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Notation: 


We  employ  a  orthogonal  Cartesian  coordinate  system  with:  y-axis  along  the  shoreline,  x- 
axis  pointing  towards  the  shore  and  z-axis  pointing  upward.  The  wave  propagation  is 
assumed  to  be  towards  positive  x-axis.  As  there  are  conflicting  conventions,  we 
frequently  include  explicit  definition  of  dimensionality  of  equations  and  models:  one¬ 
dimensional  equation  or  code  refers  to  one  spatial  dimension  i.e.  the  model  variables  are 
functions  of  x  and  t  only,  with  similar  meaning  for  two-dimensional  versions.  Note  that 
the  wave  surface  predicted  by  a  two-dimensional  model  is  three-dimensional  i.e.  it 
includes  the  height  information. 


Introduction: 

The  omnipresence  of  waves  has  led  to  many  experimental  and  theoretical  studies  to 
predict  and  simulate  the  water  waves  and  their  effects  under  various  conditions  as  diverse 
as  an  open  sea  in  a  hurricane,  the  surf  on  a  beach,  waves  in  ports  and  harbors,  waves  in 
canals  etc.  This  is  a  problem  with  enormous  scope  with  many  classes  of  physical 
phenomena.  Therefore,  the  scope  of  this  SBIR  project  is  restricted  to  long  waves  in 
shallow  water  appropriate  for  applications  to  wind  generated  near-shore  waves.  All 
modem  models  of  this  type  start  with  the  governing  equations  for  fluid  flows  i.e.  the 
Navier-Stokes  equations.  It  is  possible  to  include  many  physical  effects  in  these  models. 
The  SBIR  solicitation  identified  three  classes  of  models  for  this  problem:  models  based 
on  Navier-Stokes  equation  such  as  Volume  Of  Fluid  (VOF)  model,  the  Green-Naghdi 
type  models  and  Boussinesq  type  models.  We  use  these  terms  to  denote  classes  of 
models  rather  than  a  single  model.  We  also  note  that  all  of  these  models  are  non-linear. 
Until  reeently,  most  of  the  work  was  based  on  linear  approximations.  Of  course,  the 
linear  approximations  limit  the  range  of  phenomena  that  can  be  modeled.  The  recent 
advances  in  computers  have  allowed  the  study  of  many  of  the  effects  of  non-linearity. 
This  in  turn  has  led  to  a  flurry  of  activity  and  advances  in  non-linear  science. 

As  noted  in  the  SBIR  solicitation,  there  are  many  potential  applications  of  this 
simulation,  both  military  and  non-military.  An  immediate  application  of  the  simulation  is 
to  the  planning  of  military  exercises  and  operations  such  as  amphibious  landings.  This 
simulation  is  very  general  in  nature,  and  can  be  used  in  most  applieations  involving 
coastal  structures  and  processes. 


Selection  of  a  Model: 

We  begin  with  a  discussion  of  the  selection  of  a  model.  As  anticipated  in  our  Phase  I 
proposal,  we  propose  a  solution  based  on  the  Boussinesq  type  models  The  VOF  type 
models  use  the  full  three-dimensional  form  of  the  Navier-Stokes  equation.  Both  Green- 
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Naghdi  and  Boussinesq  models  are  specifically  adapted  for  the  water  wave  problem  and 

reduce  the  dimensionality  using  approximation  on  the  z-dependence.  We  now  summarize 

the  reasons  behind  our  selection  of  the  model. 

•  While  the  VOF  type  models  based  on  full  Navier-Stokes  equation  are  capable  of 
providing  the  most  detailed  and  accurate  information,  they  are  also  most  resource 
intensive.  This  is  primarily  due  to  the  three-dimensional  (x,  y,  z  and  t)  nature  of  these 
equations.  For  a  comparable  domain  size  and  resolution,  we  estimate  that  the 
resource  (memory  and  time)  requirements  of  these  models  are  hundreds  or  thousands 
of  times  greater  than  that  of  a  Boussinesq  type  model.  Moreover,  due  to  the  added 
dimension,  the  resource  requirements  grow  at  a  faster  rate  than  Boussinesq  type 
models  as  the  resolution  or  the  domain  size  grows.  We  estimate  that  the  technology 
required  to  adequately  simulate  the  wave  climatology  of  a  region  such  as  Camp 
LeJeune  is  many  years  away. 

•  The  Green-Naghdf  model  was  derived  using  the  theory  of  fluid  sheets.  Green  and 
Naghdi’  did  not  use  a  perturbation  approach  and  thereby  avoid  any  explicit  smallness 
(e.g.  wave  height)  assumption.  However,  they  imposed  (in  our  opinion, 
unacceptable)  restrictions  on  velocity  components:  the  vertical  component  is  linear  in 
z,  and  the  horizontal  components  are  independent  of  z.  Another  criticism  of  the 
Green-Naghdi  model  is  that  it  fails  to  preserve  vorticity.  An  attractive  improvement 
was  derived  by  Demirbilek  and  Webster'*’’  by  allowing  general  polynomial 
dependence  of  velocity  components  on  z.  They  presented  the  equations  and  a 
numerical  code,  GNWAVE,  for  one-dimensional  version  of  Green-Naghdi  Level  II 
theory.  This  code  was  provided  to  us  and  indeed  appears  to  work  under  wider 
circumstances  than  our  one-dimensional  Boussinesq  code.  However,  even  in  the  one¬ 
dimensional  case,  these  equations  are  very  complicated  and  their  derivation  requires 
involved  algebraic  manipulations.  We  estimate,  based  on  our  earlier  experience  with 
symbolic  algebra  systems,  that  the  derivation  of  the  equations  of  a  two-dimensional 
Level  II  Green-Naghdi  model  will  be  extremely  difficult  (possibly  unmanageable) 
even  with  symbolic  algebra  packages  like  MACSYMA,  Mathematica  or  Maple. 
Furthermore,  the  complexity  of  the  equations  will  impose  significant  precision 
requirement  on  the  numerical  methods.  These  facts  combine  to  substantially  reduce 
the  advantage  of  lower  dimension  over  VOF  type  models.  Due  to  these  reasons,  there 
appears  to  be  very  little  work  in  this  area,  in  particular,  there  is  essentially  no 
comparisons  with  experimental  data.  We  estimate  that  the  development  of  such  a 
model  is  significantly  outside  the  scope  of  a  SBIR  project. 

•  The  first  modem  Boussinesq  type  model  was  derived  by  Peregrine'*  using  depth- 
averaged  velocities  and  perturbation  expansions.  The  approximations  are  valid  for 
long  waves  in  shallow  water  and  the  assumption  of  weak  non-linearity  limits  the 
wave  height  and  steepness.  In  particular,  this  model  cannot  be  used  in  deep  waters  or 
in  the  break  zone.  For  surface  water  waves,  this  method  provides  great  deal  of 
simplification  with  relatively  little  loss  of  accuracy.  By  its  very  nature,  Boussinesq 
models  provide  limited  information  on  the  velocity  components.  Two  closely  related 
non-linear  equations:  the  Kadomtsev-Petviashvili  (KP)  and  the  Korteweg-de  Vries 
(KdV)  equations  have  enormous  theoretical  interest  as  they  are  examples  of 
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Completely  Integrable  Hamiltonian  systems.  Due  to  these  reasons,  this  has  been  and 
continues  to  be  an  area  of  active  research.  As  can  be  expected  with  this  degree  of 
research,  many^’  improvements  of  Peregrine’s  original  work  has  been 

proposed.  The  modifications  proposed  by  Nwogu'^  improves  the  dispersion  relation 
and  extends  the  domain  of  validity  to  very  close  to  the  deep  water  limit  where  Stokes 
expansions  can  be  used.  Nwogu’s  equations  uses  the  velocity  at  fixed  distance  from 
the  still  water  level  rather  than  the  depth  averaged  values.  Our  Phase  I  work  is  based 
on  Nwogu’s  version  of  the  Boussinesq  model.  We  use  the  numerical  methods 
proposed  by  Wei  and  Kirby^^.  Several’  improvements  or  enhancements  of 

Nwogu’s  model  have  been  proposed.  We  selected  Nwogu’s  form  of  the  model  for  the 
Phase  I  work  because  of  it’s  general  acceptance  as  a  base  model  and  it’s  relative 
simplicity.  We  intend  to  include  the  fully  non-linear  version(s)  in  Phase  II.  Both 
Nwogu'^,  and  Wei  and  Kirby^^  showed  good  agreement  between  experimental  data 
and  the  model  prediction.  We  performed  comparisons  of  the  predicted  time-series 
data  with  some  data  from  the  CERC  elliptical  shoal  experiment  of  Vincent  and 
Briggs^',  with  very  good  agreement.  All  this  research  provides  a  very  strong 
validation  for  these  models.  In  fact,  one  of  the  reasons  of  our  selection  of  Boussinesq 
type  model  is  the  availability  of  research  results,  without  which  the  progress  made 
during  Phase  I  would  have  been  impossible  and  work  proposed  in  Phase  II  would  not 
be  possible  either.  We  emphasize  that  the  proposed  work  is  neither  a  duplication  nor 
a  substitute  of  the  academic  research  that  is  being  performed  at  various  Universities 
and  Institutes.  Lastly,  we  have  studied  the  resource  requirements  of  our  code.  Based 
on  all  of  these  works,  our  conclusion  is:  for  the  present  and  the  near  future, 
Boussinesq  models  provide  the  best  option  in  providing  reasonably  accurate 
predictions  on  field-scale  problems  with  the  available  resources.  Of  course, 
eventually  (in  many  years)  models  like  VOF  using  full  Navier-Stokes  equations  will 
be  manageable  for  field-scale  problems  and  will  be  the  model  of  choice  at  that  point. 


Numerical  Codes: 

During  Phase  I,  we  developed  prototype  codes  to  numerically  solve  one  and  two- 
dimensional  time-dependent  Boussinesq  models.  We  consider  a  section  of  the  ocean  in 
the  computational  domain  0 <  x  < a  and  0<  y<b,  and  the  initial  state  is  assumed  to  be 
quiescent.  We  wish  to  predict  time-dependent  three-dimensional  form  of  the  wave  in  this 
domain  that  is  generated  by  a  wave-train  incident  along  the  boundary  at  x  =  0 .  Of 
course,  the  waves  depend  on  the  incident  wave  and  the  bathymetry  data.  The 
computational  domain,  the  incident  wave  and  the  bathymetry  data  must  fall  within 
limitations  of  the  model;  long  waves  in  shallow  water.  The  boundaries  at  x  =  a ,  y  =  0 
and  y  =  b  are  assumed  to  be  transparent  to  the  wave  i.e.  we  assume  “open  boundary 
condition”  along  these  edges. 

As  noted  earlier  we  use  Nwogu’s  equations'^  for  the  Phase  I  work.  This  model  extends 
the  validity  on  the  earlier  versions  of  Boussinesq  models  and  is  simpler  than  some  other 
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versions.  Thus,  this  model  provides  a  convenient  starting  point.  In  fact,  the  model 
prediction  agrees  quite  well  with  several  experimental  situations.  We  use  the  numerical 
method  proposed  by  Wei  and  Kirby^^.  This  is  a  finite  difference  method  with  a  fourth 
order  predictor-corrector  time  stepping  scheme.  We  follow  this  method  very  closely*. 
This  method  is  described  in  detail  in  Appendix  A,  and  we  refer  the  reader  to  the  paper  by 
Wei  and  Kirby^^  and  the  report  version  of  the  paper  for  further  discussion.  Some  of  the 
detail  (e.g.  a  flow  diagram)  in  the  report  version  was  omitted  from  the  journal 
publication.  As  is  well  known,  the  implementation  of  the  open  or  absorbing  boundary 
condition  poses  several  difficulties.  Lack  of  time  has  forced  us  to  take  some  shortcuts  in 
this  respect.  There  are  several  problems  in  implementing  the  simplest  form  of  the 
radiation  boundary  condition  r],  +  c  cos  =Q\  the  celerity  c  is  ill-defined  in  this 
dispersive  situation  and  in  the  two-dimensional  case  the  direction  is  not  known  a  priori. 
We  therefore  used  a  simplified  version  ri,-\-cT]^=Q  both  in  one  and  two-dimensional 
cases  along  the  x  =  a  boundary  with  some  choice  of  the  celerity  c .  For  the  >■  =  0  and 
y  =  b  boundaries  in  the  two-dimensional  version,  we  have  used  an  even  more  primitive 
condition  that  y-derivatives  vanish  along  these  boundaries.  The  conditions  imposed  on 
the  y  -boundaries  are  appropriate  if  there  are  no  incident  waves  at  these  boundaries,  a 
condition  that  is  at  least  approximately  satisfied  in  all  our  examples.  Our  code  has 
instability  caused  by  the  waves  trapped  by  reflection  at  the  boundaries.  This  limits  the 
time  interval  that  can  be  used  in  our  simulations.  We  also  observe  that  these  instabilities 
are  present  in  the  GNWAVE  code  as  well.  The  expected  solution  (see  Wei  and  Kirby^^ 
Nwogu'^’ Abbott,  McGowan  and  Warren’,  Engquist  and  Majda^,  Isralei  and  Orszag", 
Larsen  and  Dancy'\  Longuet-Higgins  and  Cokelet''*)  is:  (i)  use  a  higher  order  radiation 
condition  and  (ii)  implement  absorption  layers  at  these  boundaries.  We  have  started  the 
implementation  of  the  higher  order  radiation  condition  (see  equation  (42)  of  Wei  and 
Kirby^^)  for  the  two-dimensional  code,  however  it  remains  to  be  debugged.  It  should  be 
noted  here  that  another  commonly  used  boundary  condition,  the  reflection  boundary 
condition,  appropriate  for  a  canal,  a  gate  or  a  wall  etc.,  is  simpler  to  implement  and  works 
quite  well. 

We  have  implemented  a  simple  platform-independent  animation  scheme  based  on  the 
public  domain  graphics  packages  Plplot  or  Pgplot.  Similar  animation  was  added  to  the 
GNWAVE  as  well.  These  codes  provide  near  real  time  solution  and  visualization  of  the 
time-dependent  wave  in  the  one-dimensional  case  even  in  a  Pentium  based  PC.  The  “on 
the  fly”  animation  is  very  slow  (in  a  PC)  for  the  two-dimensional  case  with  a  reasonable 
resolution.  As  soon  as  the  basic  debugging  was  completed,  we  emphasized  the  case 
studies  rather  than  any  improvements  of  the  code.  In  particular,  we  did  not  have  time  to 
implement  the  absorption  layers  or  even  the  pre-factorization  of  the  matrices  in  the  two 
dimensional  case.  We  were  able  to  perform  only  a  small  number  of  case  studies  in  this 
short  period  of  time.  However,  there  are  many  more  case  studies  in  the  literature. 


*  After  we  completed  the  one-dimensional  version  of  the  code,  the  one-dimensional  code  developed  by  Wei 
and  Kirby  was  publicly  available.  Due  to  lack  of  time,  we  have  not  yet  been  able  to  study  the  code 
extensively.  However,  we  noticed  a  few  minor  differences  in  the  implementation  e.g.  the  boundary. 
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The  reflections  due  to  the  simplified  treatment  of  boundaries  leads  to  trapped  waves  and 
instabilities.  The  trapped  waves  and  the  instabilities  are  visible  (this  applies  to 
GNWAVE  as  well)  in  the  animation  when  the  execution  continues  significantly  past  the 
arrival  of  the  wave-front  at  the  near-shore  boimdary.  This  instability  makes  it  difficult  to 
perform  long-time  simulations  necessary  for  accurate  statistical  analysis  of  waves  with 
complex  spectrum.  Thus,  we  have  not  attempted  any  tests  with  incident  waves  with 
several  spectral  components.  We  have  developed  some  prototype  post-processing 
components  for  the  visualization  of  snapshot  of  the  waves  and  the  statistical  analysis  of 
the  time-series  data.  We  now  describe  some  simulations  performed  with  these  codes. 


Remarks  on  the  code: 

The  code  is  written  in  FORTRAN.  We  use  several  common  extensions  to  the  Fortran  77 
standard,  these  extensions  are  included  in  the  Fortran  90  standard.  We  have  tested  the 
code  with  Microsoft  Fortran  Powerstation  compiler,  f2c/fcc  or  g77  under  Linux,  Sun 
Fortran,  Cray  Fortran.  The  extensions  are  used  to  improve  the  readability,  but  may 
require  modifications  in  certain  computer-compiler  combination.  These  extension 
include:  (i)  use  of  lower  case  letters,  (ii)  variable  names  may  include  the  underscore 
character  or  longer  than  six  characters,  (iii)  the  do  loops  are  terminated  with  end  do, 
(iv)  a  NAMELIST  group  is  used  for  input  (v)  the  two  dimensional  code  uses  REAL* 8. 
The  code  should  use  at  least  8-byte  real  numbers,  double  precision  on  the  PC,  Sun,  etc., 
and  single  precision  on  the  Cray  etc. 

We  practise  meaningful  variables  names,  and  avoid  the  use  one  or  two  character  variable 
names  except  for  loop  index  etc.  We  use  eta  for  the  water  displacement,  velx  and 
vely  for  the  two  components  of  the  velocity.  We  use  a  prefix  wk_  for  the  variables 
used  in  Wei  and  Kirby^^,  e.g.  wk_bl  refers  to  6,  in  equation  (11)  in  Appendix  A  or 
wk_u  refers  to  U  defined  in  equation  (4)  of  Appendix  A.  We  generally  avoid  global 
variables  and  use  of  many  common  blocks.  We  use  one  common  block  for  the  model 
parameters  and  constants,  defined  in  separate  file  and  “included”  in  the  code.  Another 
common  block  is  used  for  the  input  wave  parameters.  All  arrays  that  depend  on  the 
resolution  are  declared  in  the  main  program.  Any  work  space  needed  by  a  subroutine  is 
passed  to  it  as  an  argument. 

We  use  some  graphics  based  on  the  public  domain  packages  Pgplot  or  Plplot.  We 
include  dummy  (no  output  is  generated)  graphics  routines  if  graphics  libraries  are  not 
available  or  graphics  is  not  desired.  We  use  a  few  Linpack  routines.  Optimized  Linpack 
routines  are  available  in  many  machines.  Again  we  include  sources  in  case  the  Linpack 
libraries  are  not  available. 

The  input  is  obtained  from  a  NAMELIST  group  in  a  file,  the  variables  in  this  grout  are 
described  in  sample  input  files  and  “readme”  files.  The  output  consists  of  the  displayed 
graphics  (if  any),  time  series  data  at  selected  locations,  and  snapshots  of  the  solution  at 
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particular  instants  of  time.  The  code  also  displays  the  CPU  time  information  for  the  run 
if  a  working  version  of  the  function  second  is  available  for  the  system 

The  code  essentially  follows  the  flow  chart  given  in  the  report  version  of  Wei  and  Kirby. 
We  begin  by  defining  a  default  set  of  values  of  the  parameters.  The  input  quantities  are 
then  read  in.  The  NAMELIST  read  does  not  require  all  variables  to  be  specified.  If  some 
variable  is  not  included  in  the  input,  it  retains  the  default  value.  We  then  call  the 
subroutine  initialize  which  (i)  sets  up  the  initial  conditions,  (ii)  computes  the 
bottom  profile  and  its  derivatives,  (iii)  wavelengths  of  the  incident  waves  etc.  We  then 
initialize  the  graphics  system  and  the  work-space  variables.  Next  the  execution  enters  the 
main  time  loop  which  repeatedly  call  the  subroutine  time  step  and  updates  the  current 
time  until  the  final  time  is  reached.  At  some  time  steps  we  may  perform  some  graphics 
operations;  or  write  the  time  series  or  snapshot  data.  The  subroutine  time  step 
implements  the  predictor-corrector  scheme  described  in  Appendix  A  for  one  time  step. 
We  also  use  a  filter  to  control  the  high  frequency  oscillations.  A  snapshot  data  is  saved  at 
the  end  of  the  run  and  the  graphics  system  is  closed. 

One-dimensional  Boussinesq  code: 

First  simulation  performed  was  the  propagation  of  a  solitary  wave  over  a  flat  bottom. 

Wei  and  Kirby^^  derived  an  approximate  analytical  solution: 
u=  A  sech^  [5(x  -  Ct)] 

T]  —  sech^  [5(x  -  Q)]  +  A2  sech"*  [B(x  -  Q)] 

to  Nwogu’s  equations.  Figures  la  and  lb  show  a  comparison  of  numerical  and  the 
analytical  solutions.  Figure  lb  is  the  magnified  version  of  a  portion  of  figure  la.  The 
close  agreement  between  the  two  provides  a  strong  validation  of  the  code. 

Next  we  consider  an  example  by  Nwogu:  the  propagation  of  a  monochromatic  wave  with 
period  7’  =  0.85  s  and  height  77=  0.05  m  propagating  in  a  channel  of  uniform  depth  of 
0.56  m.  The  results  are  shown  in  figure  2.  Since,  the  wavelength  is  approximately  twice 
the  depth,  this  case  demonstrates  the  extended  validity  of  the  model.  Figure  2  shows  both 
the  computed  solution  and  the  incident  wave.  The  approximate  agreement  between  the 
two  comes  from  the  fact  that  the  incident  wave  is  a  solution  of  the  linearized  equation, 
and  small  height  of  the  waves.  Increasing  the  wave-height  shows  the  effects  of  non¬ 
linearity. 
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Q  Solitary  wave:  t  =  200,  nx  =  4500,  dx  =  0.1,  dt  =  0.025 


Figure  la.  Comparison  of  numerical  (solid)  and  analytical  (dash)  solitary  wave  solution. 


^  Solitary  wave:  t  =  200,  nx  =  4500,  dx  =  0.1,  dt  =  0.025 


Figure  lb.  A  detailed  comparison  of  the  numerical  (solid)  and  analytical  solitary  wave  solution. 
Figures  la  and  lb  refer  to  the  same  solution,  i.e.  figure  lb  is  a  magnified  section  of  figure  la. 
The  water  depth  in  this  example  is  0.45  m. 
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Tj=.85s,  Hj=.05m.  ci  =  .56m;  t=40.  dx=.01,  dt=.0025 


Figure  2.  The  solid  curve  depicts  the  numerical  solution  of  the  Boussinesq  equation 
corresponding  to  the  incident  wave  shown  in  dash.  The  incident  wave  is  a  solution  of  the 
linearized  Boussinesq  equation,  so  it  is  an  approximate  solution  of  the  Boussinesq  equation. 

We  conducted  test  runs  with  bathymetry  of  the  CERC  elliptical  shoal  experiment  of 
Vincent  and  Briggs^*.  We  used  both  our  code  and  the  Green-Naghdi  model  of 
Demirbilek  and  Webster^  Both  of  these  codes  show  the  general  features  of  the  wave  up- 
to  the  shoal,  but  they  fail  to  reproduce  the  features  of  the  wave  past  the  shoal.  This  can 
be  expected,  since  the  one-dimensional  model  does  not  include  refraction  and  diffraction 
of  the  waves. 


Two-dimensional  Boussinesq  code: 

When  a  new  model  is  proposed,  it  is  judged  by  a  variety  of  techniques.  A  common 
practice  is  to  compare  the  model  with  the  predictions  of  the  linearized  equations.  While 
such  agreement  is  necessary,  usually  it  not  a  sufficient  condition  for  the  usefulness  of  the 
model.  Ultimately,  for  physical  models,  the  validity  must  be  based  on  comparison  with 
physical  measurements.  Indeed,  most  papers  in  this  area  contain  some  comparison  of 
experimental  data  and  the  model  predictions.  The  Boussinesq  models  agree  well  with 
such  measurements.  In  Phase  I,  we  were  able  to  make  such  a  comparison  with  one  case 
of  the  CERC  elliptical  shoal  experiment  of  Vincent  and  Briggs^'.  Any  comparison  with 
this  experiment  and  a  time-dependent  model  has  not  been  reported  in  the  literature.  The 
time-series  data  from  the  experiment  were  provided  by  Dr.  Zeki  Demirbilek,  Mr.  Mike 
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CERC  Elliptical  Shoal  (Case  Ml:  T;  =  1.3s,  H,  =  0.061m) 
t  =  30.0  nx  =  550  ny  =  250  dx  =  .060  dy  =  .091 


Figure  3.  The  computed  water  surface  at  time  30  sec.  for  the  CERC  elliptical  shoal  bathymetry 
and  a  monochromatic  incident  wave.  The  z-axis  shows  the  displacement  from  the  still  water 
surface.  The  shoal  center  is  located  at  (6.096,  13.72).  The  effects  of  refraction  and  diffraction  is 
clearly  visible  in  the  figure. 


Briggs  and  Ms.  Deborah  Green  of  CERC.  The  agreement  between  the  model  and  the 
experiment  is  very  good.  We  propose  to  perform  more  extensive  comparison  of  the 
models  with  available  laboratory  experiments  and  field  measurements  during  Phase  II. 

The  experiment  (CERC  elliptical  shoal  experiment)  by  Vincent  and  Briggs^'  was 
conducted  at  the  U.  S.  Army  Engineer  Waterways  Experiment  Station’s  Coastal 
Engineering  Research  Center  (CERC).  The  shoal  tests  were  conducted  in  CERC’s  35  m 
wide  by  29  m  long  directional  spectral  wave  basin.  The  bathymetry  was  uniform  with  a 
maximum  variation  of  9  mm  (0.35  in.).  The  elliptical  shoal  had  a  major  radius  of  3.96  m 
(13  ft),  minor  radius  of  3.05  m  (10  ft),  and  a  maximum  height  of  30.48  cm  (1  ft).  All 
tests  were  run  in  45.72  (1.5  ft)  of  water  depth,  so  that  the  distance  at  the  center  of  the 
shoal  to  still  water  level  was  15.24  cm  (0.5  ft).  The  waves  were  generated  with  the 
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directional  spectral  wave  generator  (DSWG).  The  wave  measurements  were  taken  at 
several  gage  locations  on  a  rectangular  grid. 

The  case  considered  here  is  the  one  labeled  as  Case  ID:  Lab  M31  or  Zeki  Ml.  This  case 
corresponds  to  a  monochromatic  input  wave  with  period  1.30  sec.  The  incident  wave- 
height  of  2.56  in  (0.065  m)  was  inferred  from  the  measurements  at  gage  #10  (see  figure  5 
for  the  locations  of  the  gages).  After  analyzing  the  time-series  data  at  these  gage 
locations  with  a  post-processing  component  developed  during  Phase  I,  we  concluded  that 
the  time-series  could  be  considered  to  be  consistent  with  a  range  of  values  for  the  wave- 
heights,  and  therefore  decided  to  perform  the  simulations  with  a  range  of  values.  To  date 
we  have  performed  simulations  with  only  one  set  of  parameters.  This  simulation  has 
been  run  with  several  sets  of  resolutions  {n^,dx,ny,dy,dt)  and  on  several  computers. 
The  consistency  of  the  results  is  a  necessary  step  in  the  validation  of  the  code.  We  have 
performed  some  other  consistency  checks  as  well. 


CERC  Elliptical  Shoal  (Case  Ml:  T]  =  1.3s,  H;  =  0.061m) 
t  =  40.0  nx  =  550  ny  =  250  dx  =  .060  dy  =  .091 


0 


Figure  4.  The  computed  water  surface  at  40  sec  for  the  CERC  elliptical  shoal  bathymetry. 
Figure  4,  shows  the  same  wave  as  in  figure  3,  but  at  a  later  time. 
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Elliptical  Shoal  (Case  Ml:  T|  =  1.3s,  Hj  =  0.061m) 
40.0  nx  =  550  ny  =  250  dx  =  .060  dy  =  .091 


Figure  5.  A  contour  plot  of  the  computed  water  surface  for  the  CERC  elliptic  shoal  bathymetry 
at  time  40  sec  i.e.  the  surface  in  figure  4,  is  shown  in  this  plot.  The  solid  lines  correspond  to  still 
water  level,  the  long-dash  contours  correspond  to  water  level  below  still  water  i.e.  the  troughs, 
and  the  short  dash  contours  correspond  to  above  still  water  level  i.e.  the  crests.  The  contours  are 
shown  at  1  cm  interval  i.e.  the  contour  lines  correspond  to  water  displacements  of  0,  ±1,  ±2,. .cm. 
The  solid  ellipses  are  contour  lines  of  the  shoal.  The  effects  of  retraction  and  diffraction  on  the 
shoal  are  clearly  visible.  The  solid  circles  denote  the  gage  locations.  Starting  form  the  bottom 
of  the  figure,  in  each  vertical  row,  the  gages  are  labeled  as  Gage  #9,  5,  1,  10  respectively.  The 
vertical  rows  of  gages  are  referred  to  as  Transects  1-5. 


In  figures  3  and  4,  we  present  the  computed  three-dimensional  wave  surface  at  time  30 
and  40  sec.  respectively.  In  figure  5,  we  present  a  contour  plot  of  the  wave-surface.  For 
reference,  we  show  some  contour  lines  of  the  shoal  and  the  locations  of  the  gages:  Gage 
#9  corresponds  to  y  =  10.668  m  (bottom  row  in  the  figure).  Gage  #5  corresponds  to 
y  =  13.716m  (second  row  from  bottom  in  the  figure),  Gage  #1  corresponds  to 
y  =  16.764  m  (third  row  from  bottom  in  the  figure)  and  Gage  #10  corresponds  to 
y  =  21.336  m  (top  row  in  the  figure).  We  use  the  term  Transect  1,  Transect  5  for 
labeling  down-wave  locations  of  10,  20,  50  ft  respectively  (vertical  rows  in  the  figure). 

Thus,  for  example,  Gage  5  Transect  3  refers  to  the  location  x  =  9.144  m  y  =  13.716  m. 
We  follow  the  notation  used  in  the  data  files  provided  to  us.  Please  observe  that  the 
effects  of  refraction  and  diffraction  over  the  mound  is  clearly  visible  in  figures  3-5.  The 
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simplified  bovindary  condition  imposed  in  our  code  assumes  no  waves  are  incident  on  the 
y-boundaries.  It  is  also  clear  from  these  figures  that  this  condition  is  at  least 
approximately  satisfied.  The  effects  of  reflection  on  the  boxmdaries  is  visible  in  the 
computed  solution  at  sufficiently  later  time. 

In  figures  6-10,  we  compare  the  experimental  and  the  computed  time-series  data  at  the 
gage  locations.  We  need  to  match  the  phases  of  the  two  sets  of  waves  for  this 
comparison.  Since,  the  simulation  starts  in  quiescent  state,  the  comparison  is  meaningful 
only  after  the  wave  “attains  the  full  wave-height”.  To  this  end,  we  look  at  the  plots  of  the 
computed  time-series  and  then  match  the  phases  by  comparing  the  experimental  time- 


CERC  Shool:  Case  Ml  Tronsect  1  -  Expt.  (solid),  Boussinesq  Model  (dash) 
Gage  1  Gage  5 


Gage  9 


Gage  10 


Figure  6.  A  comparison  of  the  measured  time-series  and  the  computed  time-series  at  the 
gages  along  transect  1.  The  measured  time-series  is  plotted  as  a  solid  curve.  The  computed 
time-series  (dash)  is  shifted  by  14.66  sec.  The  shift  in  the  computed  time-series  selected  by 
matching  the  phases  of  the  two  curves  for  Gage  #10.  This  shift  can  be  ehanged  by  a  period 
(1.3  sec.)  of  the  waves.  The  shift  is  chosen  to  be  large  enough  so  that  the  eomputed  wave  has 
"achieved  full  height”  or  “reached  a  steady  state”  for  all  gages. 

series  at  gage  #10,  transect  1  with  the  corresponding  computed  time-series  but  starting  at 
different  times.  Thus  we  superimpose  the  measured  time-series  (solid  lines)  and  the 
computed  time-series  (dashed  lines)  starting  at  14.66  sec  in  figures  6-10.  Note  that  this 
will  be  a  meaningful  comparison  if  the  starting  time  for  the  data  collection  were  same  for 
all  the  gages.  In  which  case  this  comparison  matches  not  only  the  individual  time-series, 
but  the  correlation  as  well  i.e.  it  provides  a  comparison  of  the  entire  time-dependent  wave 
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surface  at  these  locations.  Observe  that  the  agreement  in  height,  shape  and  phase  is  very 
good  for  gages  #1,  #5  and  #9  for  all  transects.  However,  there  is  significant  disagreement 
in  the  phases  of  gage  #10  in  transects  2,  3  and  5.  This  is  due  to  the  lack  of 
synchronization  in  the  starting  time  for  the  data  collection  in  these  gages.  This 
conclusion  follows  from  a  comparison  of  the  measured  gage  #10  time-series  data  at  all 
the  transects.  The  matching  of  the  phases  of  gage  #10  in  transect  4  is  accidental,  the  time 
difference  is  approximately  an  integral  multiple  of  the  period. 


CERC  Shoal:  Case  M1  Transect  2  —  Expt.  (solid),  Boussinesq  Model  (dash) 
Gage  1  Gage  5 


Gage  9  Gage  10 


Figure  7.  Comparison  of  measured  and  computed  time-series.  The  offset  used  for  the  computed 
series  is  same  for  all  transects  and  gages.  The  phase  mismatch  for  gage  #10  is  due  to  different 
starting  times  for  the  measured  series. 
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CERC  Shoal:  Casa  W1  Transect  3  —  Expt.  Csolld),  Boussinesq  Model  (dash) 


Gage  1 


0.08 


e  0.02 


2  4  6  8  10  12  14 

t  (sec),  Model  shift:  14.66 


Gage  5 


Gage  9 


Gage  1  O 


Figure  8.  Comparison  of  measured  and  computed  time-series.  The  offset  used  for  the  computed 
series  is  same  for  all  transects  and  gages.  The  phase  mismatch  for  gage  #10  is  due  to  different 
starting  times  for  the  measured  series. 


CERC  Shool:  Case  Ml  Transect  4  —  Expt.  (solid),  Boussinesq  Model  (dash) 
Gage  1  Gage  5 


Gage  9  Gage  10 


Figure  9.  Comparison  of  measured  and  computed  time-series.  The  offset  used  for  the  computed 
series  is  same  for  all  transects  and  gages.  The  phases  for  gage  #10  for  this  transect  matches  since 
the  difference  of  starting  time  between  transect  1  and  4  is  nearly  an  integral  multiple  of  the 
period. 
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CERC  Shoal:  Case  M1  Transect  5  —  Expt.  Csolid),  Bousstneeq  Model  (dash) 


Gage  1 


Gage  5 


Gage  9 


Gage  10 


Figure  10.  Comparison  of  measured  and  computed  time-series.  The  offset  used  for  the 
computed  series  is  same  for  all  transects  and  gages.  The  phase  mismatch  for  gage  #10  is  due  to 
different  starting  times  for  the  measured  series. 
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Transect 


Experimental  time— series  for  Gage  10 


(solid)  (Sc  2  (dash) 


(solid)  8c  3  (dash) 


Transect  1  (solid)  8c  4  (dash)  Transect  1  (solid)  8c  5  (dash) 


I 


Figure  11  In  this  figure,  we  compare  the  gage  #10  measures  time-series  for  transect  1  with  that 
of  transects  2-5.  None  of  the  time  series  were  shifted. 
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Experimental  appropriately  translated  time— series  for  Gage  10 
Transect  1  (solid)  Sc  2  (dash)  Transect  1  (solid)  <Sc  3  (dash) 


Transect  1  (solid)  &c  4  (dash) 


Transect  1  (solid)  Sc  5  (dash) 


Figure  12  In  this  figure,  we  compare  the  gage  #10  measured  time-series  for  transect  1  with  that 
of  transects  2-5.  The  time-series  in  transects  2-4  were  shifted  by  an  amount  of  that  required  by 
the  wave  to  travel  from  transect  1  to  the  transect  being  compared.  We  use  the  computed  value  of 
the  celerity  for  this  purpose. 


Numerical  time  — series  for  Gage  1  O 


Transect  1  (solid)  Sc  2  (dash) 


Transect  1  (solid)  Sc  3  (dash) 


Transect  1  (solid)  Sc  4  (dash) 


Transect  1  (solid)  Sc  5  (dash) 


Figure  13  In  this  figure,  we  compare  the  gage  #10  numerical  time-series  for  transect  1  with  that 
of  transects  2-5.  None  of  the  time  series  were  shifted. 
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Transect  1 


Numerical  appropriately  translated  time— series  for  Gage  10 
(solid)  3c  2  (dash)  Transect  1  (solid)  Sc  3  (dash) 


Transect  1  (solid)  8c  4  (dash) 


Transect  1  (solid)  Sc  5  (dash) 


Figure  14  In  this  figure,  we  compare  the  gage  #10  computed  time-series  for  transect  1  with  that 
of  transects  2-5.  The  time-series  in  transects  2-4  were  shifted  by  an  amount  of  that  required  by 
the  wave  to  travel  from  transect  1  to  the  transect  being  compared.  We  use  the  computed  value  of 
the  celerity  for  this  purpose. 


We  compare  in  figures  11  and  12  the  measured  time  series  of  gage  #10  for  different 
transects  in  order  to  determine  it  the  data  collection  for  the  gage  #10  for  different 
transects  started  at  different  instants  of  time.  In  figure  11,  we  compare  the  time  series 
without  any  shift,  leading  to  a  near  perfect  match.  This  match  in  figure  1 1  is  consistent 
with  a  simultaneous  start  of  data  collection  in  gage  #10  for  different  transects  only  if  the 
separation  between  the  transects  is  an  integral  multiple  of  the  wavelength.  The 
experimental  wavelength  is  not  known,  but  the  computed  wavelength  does  not  satisfy  this 
condition.  In  figure  12,  we  compare  the  time  series  in  transect  1  with  the  time  series  in 
transect  2(3,  4,  5)  shifted  by  an  amount  of  time  equal  to  the  time  required  by  the  wave  to 
travel  from  transect  1  to  transect  2(3,  4,  5).  The  curves  in  figure  12  should  match  if  the 
data  collection  started  simultaneously.  The  computed  celerity  was  used  in  figure  12.  The 
computed  celerity  should  be  a  good  approximation  to  the  physical  value.  We  note  that  a 
match  could  not  be  obtained  by  using  any  nearby  value  of  the  celerity.  The  match 
between  transect  1  and  4  reflects  the  fact  that  time  required  by  the  wave  to  travel  from 
transect  1  to  transect  4  is  approximately  four  times  the  period.  A  similar  test  was 
performed  on  the  computed  time  series  with  the  results  shown  in  figures  13  and  14.  As 
expected,  the  agreement  of  the  nixmerical  curves  are  consistent  with  the  simultaneous 
beginning  of  the  time  series.  We  conclude  that  data  collection  in  gage  #10  at  different 
transects  started  at  the  respective  downward  zero-crossing,  rather  that  at  the  same  instant 
of  time. 
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Gage  #9 

Gage  #5 

Gage  #1 

Gage  #10 

ExptTI 

Code  T1 

0.0727 

0.0625 

0.0579 

0.0630 

0.0651 

0.0623 

0.0634 

0.0621 

ExptT2 

Code  T2 

0.0718 

0.0595 

0.0895 

0.0838 

0.0671 

0.0600 

0.0666 

0.0621 

Expt  T3 

Code  T3 

0.0475 

0.0476 

0,1081 

0.1367 

0.0419 

0.0449 

0.0644 

0.0659 

ExptT4 

Code  T4 

0.0495 

0.0547 

0.1364 

0.1393 

0.0477 

0.0583 

0.0651 

0.0718 

Expt  T5 

Code  T5 

0.0689 

0.0640 

0.1087 

0.1130 

0.0617 

0.0567 

0.0653 

0.0728 

Table  1.  Comparison  of  the  measured  and  computes 
significant  wave-heights. 


In  table  1,  we  present  a  comparison  of  the  significant  wave-heights  at  the  gages.  Note 
that  this  calculation  pertains  to  a  single  gage  location  and  is  unaffected  by  the  lack  of 
synchronization  of  the  starting  times.  In  this  comparison,  one  must  consider  the  accuracy 
of  the  measured  values.  We  do  not  have  any  information  for  the  experimental  accuracy, 
but  can  hazard  a  guess  that  it  is  at  least  a  few  percent.  This  error  estimate  is  based  partly 
on  the  comparison  between  the  gage  #1  and  gage  #9  wave-heights  which  should  be  same 
due  to  the  symmetry.  The  accuracy  of  the  numerics  is  estimated  to  be  in  the  range  1-5% 
for  transects  1-4,  and  slightly  higher  for  transect  5.  We  therefore  conclude  that  the 
agreement  between  the  measurement  and  the  simulation  is  very  good.  The  largest 
difference  (about  26%)  occurs  for  gage  #5  and  transect  3,  for  a  large  wave-height.  Thus 
we  anticipate  that  the  agreement  will  become  even  better  when  a  fully  non-linear  version 
of  Boussinesq  model  is  used. 
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Uniform  Slope:  T|=9.7s,  d  =  100ft  at  x  =  0,d  =  10ft  for  x  >  4500 
t  -  200.0  nx  -  501  ny  =  201  dx  -  10.0  dy  -  15.0 


Figure  11.  An  example  near  field-scale  simulation  on  a  PC.  For  this  example,  we  consider  a 
beach  with  a  uniform  slope  of  1/50  along  the  x-direction.  The  water  depth  is  taken  to  be  100  ft  at 
the  beginning  of  the  domain.  The  minimum  water  depth  is  10ft.  The  incident  wave  has  period 
of  9.7  sec  and  the  wave-height  is  1  ft. 

This  last  case  study  is  intended  to  illustrate  the  feasibility  of  simulating  a  field-scale 
problem.  For  the  purposes  of  this  illustration  we  consider  a  wind  generated  wave  with 
period  9.7  sec.  and  a  beach  with  a  uniform  slope  of  1/50.  Figure  1 1  shows  this  computed 
wave  surface  at  200  sec.  The  simulation  was  performed  on  a  PC  and  covered  an  area 
approximately  one  mile  by  one-half  mile.  The  resolution  used  can  be  considered  to  be 
moderate,  this  simulation  could  have  been  done  with  a  lower  resolution,  but  a  more 
complex  incident  wave  will  require  higher  resolution.  As  we  argue  in  the  next  sub¬ 
section,  much  larger  scale  simulations  can  be  run  on  more  powerful  computers  currently 
in  the  market,  and  significant  improvements  are  anticipated  in  Phase  II. 


The  resource  requirements: 


A  crucial  aspect  of  the  feasibility  is  a  match  between  the  resource  requirement  and  the 
availability.  By  resources  we  mean  computer  memory  commonly  referred  to  as  the  RAM 
and  execution  or  CPU  time.  The  memory  requirement  depends  on  the  number  of  grid 
points  NX-NY  used  in  the  computation.  The  current  version  of  the  codes  are  written  in 
FORTRAN  and  does  not  use  any  compiler  extension  for  dynamic  memory  allocation. 
Thus  the  “code  size”,  the  memory  required  by  the  code,  depends  on  the  declared 
maximum  values  for  these  grid  points.  If  the  execution  is  performed  with  less  grid  points 
than  the  declared  maximum,  a  portion  of  the  code  space  is  not  accessed,  only  the 
frequently  accessed  portion  of  the  memory  is  important  to  us.  Most  modem  operating 
systems  use  virtual  memory,  typically  implemented  by  using  some  slower  permanent 
memory,  the  hard-disk  or  a  device  like  SSD.  In  order  for  the  code  to  execute,  the  code 
size  must  be  less  than  the  virtual  memory  limit.  However,  if  the  portion  of  the  memory 
that  needs  to  be  accessed  frequently  (e.g.  it  refers  to  an  array-element  that  is  being  used) 
does  not  fit  in  the  available  RAM,  a  phenomena  known  as  “thrashing”  occurs  due  to  the 
need  to  constantly  access  the  slower  hard-disk.  For  all  practical  purposes  the  code  is 
inoperable  if  thrashing  occurs.  Thus  the  maximum  number  of  grid  points  that  can  be 
used  depends  on  the  available  core  memory  (RAM).  The  time  span  of  the  simulation  is 
limited  by  the  available  CPU  time.  The  time  limitation  is  frequently  unrelated  to 
computer  issues,  but  effectively  must  be  smaller  than  the  mean  time  between  failures 
from  all  causes. 

Therefore,  we  conducted  some  performance  analysis  of  the  codes.  The  memory 
requirement  is  linear  in  the  number  of  grid  points  NX-NY  used  in  the  computation.  The 
time  required  is  approximately  linear  in  NX-NY-NT  with  NT  being  the  number  of  time 
steps.  We  verified  this  linear  behavior  in  some  examples.  Since  the  time  stepping 
scheme  is  iterative  in  nature,  the  CPU  time  may  be  substantially  longer  in  some  cases. 
The  tests  were  performed  on  several  computers,  some  of  the  results  are  shown  in  Table  2. 
The  two  PCs  Abbot  and  Abbot2  are  at  or  slightly  below  the  bottom  of  the  line  PCs 
currently  on  the  market.  All  the  runs  described  in  the  earlier  sections  were  performed  on 
Abbot2.  The  Cray  is  also  outdated.  The  Sun  workstation  is  a  high-end  workstation 
currently  on  the  market.  For  each  computer/compiler  combination  we  used  the  default 
compiler  options  in  one  set  of  runs  (columns  without  any  compiler  flags),  and  one  or 
more  runs  with  various  optimization  options  for  the  compiler.  The  conventional  wisdom 
in  the  scientific  computing  community  on  compiler  optimization  is  that  it  can  change  the 
results.  While  we  are  aware  that  this  does  happen  and  one  must  be  careful,  it  is  our 
opinion  that  any  such  changes  beyond  the  cumulative  truncation  errors  is  due  to  some 
bug  in  the  compiler,  bug  in  the  code,  poor  code  design,  or  the  presence  of  some  sensitive 
dependence.  If  it  is  a  case  of  sensitive  dependence  then  it  presents  serious  problem  to  the 
numerical  methods.  In  any  case,  it  is  prudent  to  be  cautious  in  this  area.  For  the  test 
cases,  the  results  are  independent  of  the  optimization.  All  calculations  were  performed 


21 


with  8-byte  real  numbers,  single  precision  on  the  Cray  and  double  precision  on  the  other 
machines.  All  the  optimization  levels  attempt  to  maximize  the  execution  speed,  the  flag  - 
Zp  on  the  Cray  YMP  does  not  attempt  scalar  optimization. 


Abbot2 

Gateway  2000 

Pentium  120  MHz 

Processors:  1 

Memory:  16MB 

Windows  95 

MS  Powerstation  4.0 

Est.  Price:  $1,800 

Abbot 

Gateway  2000 

Pentium  120  MHz 

Processors:  1 

Memory:  64MB 

RedHat  Linux  3.0.3 

F2c  with  gcc  2.7.2 

Est.  Price:  $2,500 

Math 

Sun  Microsystems 

Ultra  2  200  MHz 

Processors:  2 

Memory:  764MB 

SunOS  5.5.1 

Sun  f77  4.0 

Est.  Price:  $80,000 

OSCA 

Cray  Research 

Cray  YMP  8/1 28E 

Processors:  8 

Memory:  1024MB 

UNICOS  9.0.2 

Cray  cf77  6.2 

Compiler  Flags 

/G5  /Ox 
/Ob2 

-m486  -03 
-ffast-math 

-last  -XU4 
-xparallel 
-xarch  = 
vSplus 

-Zp 

-u  iniinej 

-0  scalar3 
-0  task3 
-0  vector3 

Ngwou  2d 
100x100x200 

196.09 

79.10 

298.68 

137.52 

239.53 

22.14 

35.81 

14.31 

14.76 

Megaflops 

5.48 

13.58 

3.60 

7.81 

4.49 

48.53 

30.01 

75.09 

72.80 

Relative  Speed 

1.000 

2.479 

0.657 

1.426 

0.819 

8.857 

5.476 

13.703 

13.285 

Ngwou  Id 

4500x800  pre¬ 
factored 

62.89 

45.91 

105.74 

54.55 

68.09 

6.15 

7.39 

7.57 

7.41 

Megaflops 

11.95 

16.36 

7.10 

13.77 

11.03 

122.16 

101.66 

99.24 

101.39 

Relative  Speed 

1.000 

1.370 

0.595 

1.153 

0.924 

10.226 

8.510 

8.308 

8.487 

Ngwou  Id 
4500x800  not  pre¬ 
factored 

83.54 

66.30 

134.13 

73.69 

82.12 

11.18 

15.58 

15.65 

15.71 

Megaflops 

10.29 

12.96 

6.41 

11.66 

10.47 

76.87 

55.16 

54.91 

54.70 

Relative  Speed 

1.000 

1.260 

0.623 

1.134 

1.017 

7.472 

5.362 

5.338 

5.318 

Whetstone  (scalar) 

16.69 

9.01 

13.78 

10.99 

9.77 

2.65 

IBIQ 

11.21 

4.82 

Relative  Speed 

1.000 

1.852 

1.211 

1.519 

1.708 

6.298 

1.492 

1.489 

3.463 

Mflops 

5.84 

11.34 

5.70 

7.04 

6.99 

39.85 

2.30 

19.10 

270.16 

Relative  Speed 

1.000 

1.942 

0.977 

1.205 

1.196 

6.823 

0.393 

3.271 

46.261 

Table  2.  Computer  performance 


In  Table  2,  the  row  labeled  Nwogu  2d  gives  the  CPU  time  in  seconds  taken  by  our  two- 
dimensional  code  with  100  x  100  grid  points  for  200  time  steps.  Next  row  labeled 
Megaflops  (=  Number  (in  Millions)  of  floating  point  operations  per  second)  is  obtained 
by  using  the  number  of  floating  point  addition,  multiplication  and  reciprocals  given  in  the 
output  of  the  hpm  command  on  the  Cray  YMP.  Next  row  gives  the  relative  speeds.  The 
Boussinesq  code  requires  the  solutions  of  tridiagonal  systems  of  linear  equations  during 
each  computational  step.  The  matrix  is  fixed  and,  therefore,  can  be  pre-factored.  This 
pre-factoring  has  not  yet  been  implemented  in  our  two-dimensional  code.  The  next  six 
rows  in  the  table  give  the  CPU  times  for  the  one-dimensional  code  (4500  grid  points  and 
800  time  steps)  with  and  without  this  pre-factorization  of  the  tridiagonal  matrix.  The  next 
two  rows  give  results  of  the  standard  non-vectorizable  Whetstone  benchmark.  The 
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Mflops  for  the  scalar  machines  Abbot,  Abbot2  and  Math  gives  the  “megaflops”  for  a 
subroutine  call  and  the  average  of  one  addition  and  a  multiplication.  For  the  Cray  YMP, 
the  first  Mflops  number  is  this  scalar  number,  the  second  Mflops  number  is  the  result  of  a 
partially  vectorizable  loop:  a(i)  =  a(i  - 1)  +  b*c(i),  and  the  third  Mflops  number  is  due  to  a 
call  to  the  fully  vectorizable  machine  optimized  subroutine  SAXPY.  This  third  Mflops 
number  should  be  very  close  to  the  peak  floating  point  speed  (per  CPU)  on  the  Cray. 
HoAvever,  due  to  necessity  to  solve  the  tridiagonal  system,  the  second  number  provides  a 
more  realistic  assessment  of  the  performance  gain  expected  for  the  Boussinesq  code. 
Note  that  pre-factorization  improves  the  execution  speed  by  nearly  a  factor  of  two.  Note 
that  the  degree  of  improvement  due  to  compiler  optimization  varies  widely  with  the 
compiler,  in  fact  there  is  no  improvement  on  the  Cray  for  the  one-dimensional  code. 
There  is  generally  more  improvement  in  the  two-dimensional  code  that  the  one¬ 
dimensional  code.  The  compiler  optimization  converts  the  doubly  subscripted  arrays  to 
singly-subscripted  arrays  for  part  of  the  improvement.  The  vector  architecture  of  the 
Cray  supercomputers  does  not  provide  large  gain  for  this  class  of  problems.  With  the 
help  of  Dr.  Zeki  Demirbilek  we  made  some  test  runs  on  the  Cray  C90.  Finally,  this  table 
(possibly  in  conjunction  with  other  standard  benchmarks  such  as  SPEC_fp)  can  be  used 
to  estimate  execution  times  of  the  Boussinesq  codes  in  most  machines. 


Resource  requirement  of  a  sample  field-scale  simulation: 

We  conclude  this  section  with  the  estimate  of  the  resource  requirements  of  a  field-scale 
simulation.  Note  that  the  size  of  the  domain  that  can  be  simulated  depends  on  the 
wavelengths,  the  domain  size  increases  as  the  wavelength  increases.  Thus  a  better 
measure  of  the  size  of  the  domain  is  the  “number  of  waves”  contained  in  the  domain. 
Here  we  consider  the  most  likely  wind  wave  and  a  typical  beach.  Let  us  consider  an 
idealized  beach  with  uniform  slope  of  1/50  and  a  monochromatic  wave  of  period  10  sec. 
The  wavelength  of  a  wind  generated  wave  of  period  10  sec  is  156  m  in  deep  water.  For 
this  wave,  water  depth  larger  than  78  m  is  considered  deep.  So  we  take  the  simulation 
domain  to  begin  3750  m  off-shore  at  a  water  depth  of  75  m  and  end  at  1 50  m  off-shore  at 
a  water  depth  of  3  m.  The  wavelength  varies  from  156  m  at  the  deep  end  to  53. 1  m  at  the 
shallow  end.  The  resolutions  dx  =  2  va,  dy  =  3m,  dt  =  0.05  s  are  reasonable,  though  one 
can  use  coarser  grid  at  the  deep  end,  a  finer  grid  may  be  necessary  at  the  shallow  end. 
With  this  set  of  parameters,  we  have  NX  =  1804,  NY  =  1004  and  NT  =  12,000  for  a  3.6 
km  by  3  km  section  of  the  ocean  and  10  min  simulation.  The  memory  required  for  this 
simulation  is  approximately  675  MB,  larger  than  the  maximum  in  a  typical  PC,  but  well 
within  the  range  for  a  typical  high-end  workstation.  Using  table  2,  we  estimate  the  run 
times  to  be  72  hours  in  a  Sun  Ultra  II,  44  hours  in  a  Cray  YMP  and  24.5  hours  in  a  Cray 
C90.  Note  that  approximately  10-12  such  simulations  can  run  simultaneously  in  a  Cray 
C90.  Based  on  other  benchmarks  available  on  the  Internet,  we  estimate  the  run  times  to 
be  25  hours  on  a  top  of  the  line  Digital  AlphaServer  8400  5/440  or  approximately  12-15 
hours  on  a  Cray  T90.  We  conclude  that  it  is  feasible  to  use  the  Boussinesq  type  model  to 
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simulate  the  wave-climateology  of  sufficiently  large  (to  be  of  practical  military  interest) 
sections  of  ocean  using  currently  available  computing  resources. 

The  resource  requirements  of  the  previous  paragraph  pertain  to  the  Phase  I  prototype 
code.  We  anticipate  that  the  improved  numerics  in  Phase  II  will  significantly  reduce  the 
run  times.  In  fact  a  doubling  of  the  speed  is  expected  from  pre-factorization  alone. 
Combined  with  the  expected  advances  in  computers,  a  speed  gain  by  at  least  a  factor  of  5 
by  the  end  of  Phase  II  seem  to  be  reasonable.  We  also  expect  that  the  optimization  efforts 
will  reduce  the  memory  requirements  and  increase  the  speed.  Lastly,  any  success  in 
parallelizing  the  code  can  lead  to  significant  speed  improvements  in  the  massively 
parallel  computer. 

We  believe  that  the  Phase  I  work  and  the  extensive  research  in  this  area  clearly 
demonstrate  the  feasibility  of  using  Boussinesq  type  models  to  simulate  nonlinear  three- 
dimensional  near  shore  water  waves  in  field  scale  size  domains  using  current  computers. 
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Appendix  A 


In  this  Appendix,  we  provide  a  brief  review  of  the  numerical  scheme  proposed  by  Wei 
and  Kirby^^  to  solve  the  equations  of  the  Boussinesq  type  model  of  Nwogu^®.  Let  r]  be  the 
surface  elevation,  h  be  the  local  water  depth,  u,  v  are  the  x-,  y-components  of  the  velocity 
at  the  water  depth  Za-  Wei  and  Kirby^^  rewrite  the  equations  as: 


rit  =  E{t},u,v)  (1) 

Ut^F{r],u,v)  +  [F^{v)]^  (2) 

Vt  =  G{r],  M,  v)  +  [Gi  (u)]^  (3) 

where 

U{u)  =  u  +  h  [bi  huxx  +  b2  {hu)^J  (4) 

V{v)  =  v  +  h  [bihvxx  +  b2{hv)^J  (5) 


are  treated  as  simple  variables  in  the  time  stepping  scheme,  and, 

E{r],  [{h  +  ri)u]^  -  [{h  +  r))v]y 

“{^1^  (^2ca;  ^xy)  “I” 

^d-lh  {Vyy  dxy)  “h  Oi2h  \(^hv^yy  4“  (^hlCjxy^^y 
=  -[{h  +  r])u]x  -  [{h  +  ri)v]y 

hx(dxx  4~  ^xy)  4“  ciih  {u  XXX  4"  Vxxy^ 

+  2a2hhx  [{hu)xx  +  {hv)xy]  +  ci2h^  [{hu)xxx  +  (hv)  sij/]} 


■^3(21 ^y  (^yy  "^xy)  d^h  (*^yyy  4*  "^xyy) 

■4"  2(i2hhy  [(h'y)yy  “I"  (hu)a;y]  ”t"  02^  "1“  (b) 

F{r],  u,  v)  —  -gr]x  -  {uux  +  vuy)  (7) 

G{r],  u,  v)  =  -grjy  -  {vvy  4-  uvx)  (8) 

Fi{v)  =  -h  [6iht;a;y  +  b2{hv)^y  (9) 

Gx{u)  =  -h  ^bihuxy  +  b2{hu)^y  .  (10) 

The  constants  01,02,61,62  are  given  by 

oi  =  /3V2-1/6,  02  =  /3  +  1/2,  6i  =  /52,  62=^  (11) 

where  (3  =  Za/h,  and  g  is  the  gravitational  acceleration. 
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These  equations  are  solved  numerically  by  a  finite  difference  method.  They  are  dis¬ 
cretized  on  a  uniform  grid  as:  x  =  =  nA^  with  i  =  = 

0,  •  •  • ,  ATy,  n  =  0,  •  •  •  Nt-  Suppressing  the  time  dependence,  i.e.  the  index  n,  we  have: 

Ei,j{v,u,v)  = 

J  {h  +  r})i+ijUi+ij  -  {h  +  j  {h  +  -  {h  +  ■n)ij-iUij-i 


V 

I 


2dx 


V 


2dy 


-  aihtn 

+ 


I 

3  /  ~  -f-  2Ui-ij  —  Ui-2,j 


2dx^ 

Vi+l,j+l  -  Vi+l,j-l  -  -  VjJ-l)  +  Vi-l,j+l  -  Vi-l,j-l 

2dx‘^dy 


) 


^a2ni,j[nx)i,j\  ^^2 


-I- 


1 

(hu)i+ij+i  —  {hv)i+ij-i  -  {hv)i-i^j+i  -b  {hv)i-.ij-i 


Adxdy 


"  1 - 2*5 


(12) 


-I- 


{hv)i+ij-\-i  —  (fiu)i-i-i 2{hv)ij+i  2(hv)ij—i  -I-  {hv)i—ij^i 


.(fi  ). 


2dx‘^dy 

o_  1.2  /u  \  J  *^*,5+1  ~  2Uij  -|- Uij_i  ^  Uj-I-I —  l  —  U.j_i 4- Uj—lj  —  l 'I 


dy^ 


-I- 


j-t-2  '2Vi,j+l  +  Vi,j-2 

1  2dy= 

•^i-i-ij-H  -  -  2('Ui+i j  -  Uj-ij)  +  Uj+ij-i  -  Uj-ij-i 

2dy^dx 


] 


•^r,  h  (h  \  /  -f  {hv)ij-i 

-  Za2niJ[lly)i,j<^  - ^^2 - 


-I- 


1 

(Au)i+i,j+i  ~  {hu)i+i,j-i  -  {hu)i-ij+i  -f-  {hu)i-ij- 


Adxdy 


i-i,j-i  I 


-  02  A: 


2  /  {hv)i,j+2  2(/tu)jj-|.i  -f  2{hv)ij—i  {hv)ij—2 


+ 


\  2dy^ 

(/m)i-Hj-i-i  —  (hu)!—! j-i-i  —  2{hu)i.^.ij  —  2(/tu)t_ij  -1-  {hu)i^ij—i  —  {hu)i—ij—i 


2dy^dx 


] 


where 


/L  \ 

“  2dx 

(13) 

/■I,  ^ 

~  2dy 

(14) 
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The  spatial  discretization  of  t/,  V  are  given  by: 


d"  bihij 


{hu)i^ij  -  2{hu)ij  +  {hu)i--i 


h  '  I  1  r  2/^?  • 

^  hhij  +  b2hi-ij  Ui-ij  +  1  -  +  ^2)  Ui,j 

+  bihij  +  b2hi+ij  Ui+ij 


+  Ki  [i'.'xj  (— 

*  "  ~  2/lr^ 

~  “I"  — 1  ^i,j  — 1  +  1  ^y2 


(/it))ij+i  2{hv)ij  4-  {hv)ij--i 


All  terms  in  equation  (15)  have  the  same  index  j,  and  thus  it  can  be  treated  as  a  linear 
system  connecting  u  and  U  via  the  tridiagonal  matrix  .  The  quantity  U  is  treated  as  a 
simple  variable  in  the  time  stepping  scheme,  i.e.  its  values  are  obtained  directly.  The  values 
of  u  are  then  obtained  by  solving  Ny  tridiagonal  equations.  Note  that  the  matrix  is  constant 
and  involves  only  one  dimension.  The  boimdary  terms  have  a  slightly  different  form,  that 
can  be  handled  either  by  redefining  the  matrix  or  by  the  right  side  of  the  equation,  we 
redefine  the  right  side.  Similar  remarks  apply  for  equation  (16).  The  remaining  quantities 
are  given  by: 


Fi,j{v,u,v)  = 


Vi+lJ 


Uij 


(  Ui+l,j  Uj-ij  \  _  I  ^i,j+l 
\  2dx  J  \  2dy 

^i-2,3  ~  +  ^Vi+l,j  ~  Vi+2,j\ 

12dx  J 

(  Wi-2,i  —  8Ui_i  J  +  SUi+i  J  —  Ui^2,j  \ 

V  12rfa;  ) 

V  12dy  )  ’ 


- 
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Gi,j{r],u,v)  =  -g 


-  ^i,3 


ViJ-l  Viyj-1 
2dy 


2dy 


u 


\  2dx 


0 


=  -9 


'9i,j-2  -  +  ^Vi,j+1  -  'ni,j+2'\ 

12dy  J 

-2  -  8Vij-i  +  SVij+i  -  Vij+2 


-  “ij 


12dy 

,j  -  8Vi-iJ  +  SVj+ij  -  Vi+2,j 
12dx 


/ 

')■ 


{Fi{v))ij  —  hij 


+  k 


ihij 


(Gi(u))ij  —  hij  bihij^- 


+  b: 


(18) 


(19) 


' Vj+lJ+l  -  Vi+l,j-l  -  Vi-l,j+l  +  Vi-l,j-l\ 

Adxdy  J 

'{hv)i+i,j+i  -  (/tu)i+ij-i  -  {hv)i-ij+i  +  {hv)i-ij-i^ 

Adxdy  ) 

'  ^i+lj+l  -  Ui+l,j-l  -  Uj-ij+i  +  Uj-ij-i  \ 

Adxdy  J 

'  {hu)i+ij+i  —  {hu)i+ij-i  —  {hu)i-ij^i  +  {hu)i-ij-i 

Adxdy 

We  have  used  both  (selected  by  a  flag)  the  three-point  and  five-point  formula  in  equations 
(17)  and  (18). 


(20) 


The  time  stepping  scheme  proposed  by  Wei  and  Kirby^^  is  a  Predictor-Corrector  sc¬ 
heme.  The  predictor  step  is  the  third  order  explicit  Adams-Bashforth  scheme,  given  by; 


Vi,t'  =  V?.j  +  ^  [iSFT.i  -  16f’7‘  + 

+  2(F,)rj  -  3(F,)"T'  +  (22) 

+  2(Gi)"j  -  3(Gi)?-'  +  (G,)?-^  (23) 

The  corrector  step  is  the  fourth  order  explicit  Adams-Moulton  scheme,  given  by: 

‘  ^  p4) 

^  ‘ + 

+  (F,);^t'  -  (F,)")j,,  (25) 

'77'  =  ^  -  6G"7'  +  Gif] 

+  (Gl)lf  -  (G,r)i.j  (26) 
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The  scheme  is  applied  as  follows.  We  assume  that  the  dependent  variables  77,  u,  v  are  known 
at  three  successive  time  steps:  n  -  2,  n  -  1,  n.  All  terms  on  the  right  sides  of  the  equations 
(21  -  24)  can  be  calculated  using  these  values  and  equations  (12  -  20).  This  gives  the 
“predicted”  values  of  r],U,V  at  the  next  time  step  n  +  1.  The  values  of  u,v  are  then 
obtained  by  solving  the  tridiagonal  systems  in  equations  (15,  16).  In  the  corrector  step 
equations  (24  -  26),  the  next  time  step  n  +  1  occurs  on  both  sides  of  this  equation  i.e. 
this  is  an  implicit  equation  which  is  solved  by  iteration.  The  “predicted”  values  obtained 
from  the  predictor  equations  (21  -  23)  gives  the  starting  point  of  this  iteration.  The  values 
at  k  —  th  stage  of  the  iteration  is  used  in  the  right  side  of  equations  (24  -  26)  to  obtain 
the  values  at  stage  fc  +  1.  The  iteration  stops  if  convergence  is  achieved  or  the  number 
of  iteration  exceeds  some  preset  value.  The  convergence  criteria  is  that  either  the  relative 
error  in  norm  or  the  is  less  than  some  prescribed  tolerance  e  for  each  of  77,  u,  v.  This 
procedure  can  be  repeated  for  the  following  time  step. 

This  is  multi-level  scheme.  The  successive  time  steps  are  required  to  start  the  scheme. 
The  0  —  th  step  is  given  by  the  initial  condition.  As  in  other  multi-steps  schemes,  a  single 
stem  scheme  such  as  the  Euler  scheme  is  used  to  obtain  the  next  two  time  steps.  However, 
use  of  Euler  scheme  is  not  necessary  in  this  problem.  This  is  due  to  the  fact  that  we  begin 
with  still  water  i.e.  77,  u,  v  are  identically  zero  at  i  =  0.  It  is  easily  seen  that  this  condition 
can  be  assumed  for  the  first  three  time  steps  without  any  loss  of  generality. 

We  note  that  the  program  must  store  the  values  for  three  successive  time  steps  as  well 
as  two  sets  of  values  needed  in  the  iteration  stage  of  the  corrector  scheme.  If  one  wishes  to 
“restart”  a  simulation,  the  variables  at  three  successive  step  must  be  stored  at  the  end  of 
first  run  and  retrieved  at  the  start  of  next  run. 

Finally,  we  note  that  the  formulae  for  the  one-dimensional  case  can  be  obtained  from 
the  two-dimensional  formula  by  reduction.  To  obtain  the  one  dimensional  equations  we 
set  v,V,G,Fi,Gi  to  zero,  the  y  derivatives  to  zero,  and  all  discretized  quantities  to  be 
independent  of  j  in  equations  (1  -  26). 
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